SpeakBySinging: Converting Singing Voices to Speaking Voices While Retaining Voice Timbre
Abstract
This paper describes a singing-to-speaking synthesis system called “SpeakBySinging” that can synthesize a speaking voice from an input singing voice and the song lyrics. The system controls three acoustic features that determine the difference between speaking and singing voices: the fundamental frequency (F0), phoneme duration, and power (volume). By changing these features of a singing voice, the system synthesizes a speaking voice while retaining the timbre of the singing voice. The system first analyzes the singing voice to extract the F0 contour, the duration of each phoneme of the lyrics, and the power. These features are then converted to target values that are obtained by feeding the lyrics into a traditional text-to-speech (TTS) system. The system finally generates a speaking voice that preserves the timbre of the singing voice but has speech-like features. Experimental results show that SpeakBySinging can convert singing voices into speaking voices whose timbre is almost the same as the original singing voices.
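The feature-conversion step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the frame-based data layout, and the nearest-frame time warping are all assumptions made for illustration. The sketch assumes the singing voice has already been analyzed into per-frame spectral envelopes (standing in for timbre) with phoneme boundaries, and that a TTS system has supplied target phoneme boundaries plus per-frame F0 and power contours whose segments are contiguous and cover the whole utterance.

```python
from typing import List, Sequence, Tuple

# (phoneme label, start frame, end frame exclusive); hypothetical layout
Segment = Tuple[str, int, int]


def warp_indices(src_start: int, src_end: int, tgt_len: int) -> List[int]:
    """Map each of tgt_len target frames to a source frame in [src_start, src_end)."""
    src_len = src_end - src_start
    if src_len <= 0 or tgt_len <= 0:
        raise ValueError("segments must contain at least one frame")
    return [src_start + min(src_len - 1, (i * src_len) // tgt_len)
            for i in range(tgt_len)]


def sing_to_speech_features(
    sing_envelope: Sequence[Sequence[float]],  # per-frame envelope of the singing voice
    sing_segments: Sequence[Segment],          # phoneme boundaries in the singing voice
    tts_segments: Sequence[Segment],           # target boundaries from the TTS system
    tts_f0: Sequence[float],                   # target F0 contour (0.0 = unvoiced)
    tts_power: Sequence[float],                # target power contour
):
    """Keep the singer's envelope, but take duration, F0, and power from the TTS targets."""
    if [p for p, _, _ in sing_segments] != [p for p, _, _ in tts_segments]:
        raise ValueError("phoneme sequences must match")

    out_env = []
    for (_, s0, s1), (_, t0, t1) in zip(sing_segments, tts_segments):
        # Stretch or compress each phoneme to its speech-like duration.
        for idx in warp_indices(s0, s1, t1 - t0):
            out_env.append(list(sing_envelope[idx]))

    if len(tts_f0) != len(out_env) or len(tts_power) != len(out_env):
        raise ValueError("target contours must cover every target frame")

    # F0 and power are replaced outright by the speech-like targets.
    return out_env, list(tts_f0), list(tts_power)


# Toy usage: a 4-frame sung vowel and a 2-frame consonant shortened to 2 + 1 frames.
envelope = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
env, f0, power = sing_to_speech_features(
    envelope,
    sing_segments=[("a", 0, 4), ("k", 4, 6)],
    tts_segments=[("a", 0, 2), ("k", 2, 3)],
    tts_f0=[120.0, 110.0, 0.0],
    tts_power=[0.5, 0.4, 0.1],
)
```

The returned triple would then be passed to a vocoder for resynthesis, which is outside the scope of this sketch. Nearest-frame warping is chosen only for brevity; a real system would more plausibly interpolate between frames.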
Similar resources
Speech-to-Singing Synthesis System: Vocal Conversion from Speaking Voices to Singing Voices by Controlling Acoustic Features Unique to Singing Voices
Introduction: This paper introduces a speech-to-singing synthesis system, called SingBySpeaking, which can synthesize a singing voice, given a speaking voice reading the lyrics of a song and its musical score. The system is based on the speech manipulation system STRAIGHT and is comprised of four models controlling three acoustic parameters: the fundamental frequency (F0), phoneme duration, and...
Development of the F0 Control Model for Singing-Voices Synthesis
Fundamental frequency (F0) control models for singing voices are required to construct singing-voice synthesis systems that can generate natural singing-voices. This paper describes the development of an F0 control model for singing-voices synthesis. F0 fluctuations are revealed as characteristics that need to control the F0 contour of singing-voices by investigating how much they influence sin...
On Human Capability and Acoustic Cues for Discriminating Singing and Speaking Voices
In this paper, acoustic cues and human capability for discriminating singing and speaking voices are discussed to develop an automatic discrimination system for singing and speaking voices. Based on the results of preliminary subjective experiments, listeners discriminate between singing and speaking voices with 70.0% accuracy for 200-ms signals and 99.7% for one-second signals. Since even shor...
Vocal conversion from speaking voice to singing voice using STRAIGHT
A vocal conversion system that can synthesize a singing voice given a speaking voice and a musical score is proposed. It is based on the speech manipulation system STRAIGHT [1], and comprises three models controlling three acoustic features unique to singing voices: the F0, duration, and spectral envelope. Given the musical score and its tempo, the F0 control model generates the F0 contour of t...
Analysis of acoustic features affecting "singing-ness" and its application to singing-voice synthesis from speaking-voice
To construct a natural singing-voice synthesis system, it is important to adequately control acoustic features such as fundamental frequency (F0), spectrum shapes, and phoneme duration in the synthesis method. This paper reveals acoustic features affecting singing-voice perception by comparatively analyzing singing- and speaking-voices, and then proposes a transforming method from speaking-voice in...